Summary

The OntoAgents project was part of the DARPA-sponsored DAML effort (BAA 00-07).
Our OntoAgents project started 1 July 2000 and terminated as of 31 Dec 2004. It was
monitored by the Air Force Rome Laboratories (AFRL/IFSA, 525 Brooks Road, Rome,
NY 13441-4505); the Air Force account is F30602-00-2-0594.

The  cognizant  Rome  Laboratory  staff  are  Nancy  Koziarz  (Nancy.Koziarz@rl.af.mil)  and 
Mark  J.  Gorniak.  DARPA  program  management  included  Jim  Hendler  and  Murray 
Burke. 

The objective of the OntoAgents project was to develop concepts and modules that can serve
as an ontology-driven 'Food Chain' for Advanced Applications on the Web.

Personnel 

The  principal  investigator  at  Stanford  was  Prof.  Gio  Wiederhold  and  the  principal 
scientific  assistant  project  manager  was  Stefan  Decker. 

Gio  Wiederhold  retired  formally  in  July  2001,  but  maintained  responsibility  for  academic 
achievements,  while  recalled  to  25%  active  duty  for  teaching  and  research.  In  July  2002 
Stefan Decker and the principal focus of the project moved to the Information Sciences
Institute (ISI) of the Univ. of Southern California (USC).

Until  the  summer  of  2002  Stanford  had  a  subcontract  with  Karlsruhe  (Prof.  Rudi  Studer). 
After  that  date  we  had  a  subcontract  for  ongoing  work  with  USC  ISI  (Stefan  Decker) 
(ending 31 March 2004). USC also took over the Karlsruhe contract at that time. The
Stanford  extension  beyond  1  April  2004  was  to  allow  a  student  to  complete  his  PhD 
thesis  on  the  Ontology  algebra. 

We  list  the  people  that  participated  below,  with  their  academic  achievements  during  the 
project (in parentheses) and their current positions. An asterisk (*) indicates that they
received  financial  support  from  the  OntoAgents  project. 

1. Prof. Gio Wiederhold, PhD, Principal Investigator * (Retired) Recalled for active
duty  to  teach  the  Freshman  course:  Business  on  the  Internet;  consulting  with  MITRE 
Corp. 

2.  Prof.  Rudi  Studer,  PhD,  Principal  Investigator,  subcontract,  Professor,  University  of 
Karlsruhe. 

3. Mark Musen, PhD, MD, Co-Investigator (2001) Professor, Director Section of
Medical  Informatics,  Stanford  University 

4.  Stefan  Decker,  Project  Leader  *  (PhD,  Karlsruhe,  January  2002)  Research  Staff, 
Digital  Enterprise  Research  Institute  (DERI)  Galway,  European  Semantic  Web 
Research  Center  and  Nat.Univ.  of  Ireland. 


5.  Steffen  Staab,  PhD,  Project  Leader  *(Habilitation,  Karlsruhe,  2000),  Professor,  AI 
Institute (AIFB), University of Koblenz-Landau, Germany.

6.  Sasha  Buvac,  research  assistant  (PhD  2004)  Australian  National  University 

7.  Ray  Fergerson,  Information  Systems  Specialist  *  (2001-2002)  Stanford  Medical 
Informatics,  Protege  project. 

8.  Siegfried  Handschuh  *(PhD,  Karlsruhe,  February  2005)  AI  Institute  (AIFB), 
University of Karlsruhe, Germany.

9.  Yuhui  Jin  *(MS,  honors,  Stanford  2003)  Technical  staff,  Amazon.com,  Seattle  WA. 

10.  Maarten  Kersten,  PhD  (2000,  Visiting  Researcher),  CWI,  Amsterdam,  Holland 

11.  Martin  Lacher,  graduate  researcher  *tuition  only  (2001,  Visiting  Researcher) 
Technical  Univ.,  Munich,  Germany 

12. Sergey Melnik, graduate researcher * (PhD, Leipzig, 2003) Microsoft Research,
Redmond  WA. 

13.  Prasenjit  Mitra  *  (2001,  PhD,  Stanford,  2004)  Assistant  Professor,  Penn  State  Univ., 
State  College  PA. 

14. Natalya Fridman Noy, PhD (2003-2004) Senior Research Associate, Stanford
Medical  Informatics,  Protege  project. 

15. Sichun Xu, graduate research assistant *(2001, CS MS 2002) eBay Corporation, CA.

16. Fernando Arguello, assistant (BS 2002, Santa Clara Univ.; participant in the Stanford
SURF outreach program) Now at IBM Poughkeepsie zXML group, NY.

Introduction 

A notable aspect of the OntoAgents project is the broad interaction that it enabled among
European  and  American  researchers.  As  such  it  brought  together  extant  and  continuing 
research  on  the  formal  approaches  to  knowledge  management,  the  pragmatic  background 
of  Expert  systems  approaches,  and  the  concerns  for  scalability  from  database 
technologies. 

Having a formal underpinning in complex projects is essential for reliability,
maintainability to enable a long life, and scalability. Dealing with pragmatic issues is
essential in practical situations, which involve heterogeneous data, autonomous
participants, and the need for effective performance. One example of attempting to bridge
the gap is the proposal for Description Logic Programs: combining logic programs with
description logic (DL) [GrosofHVD:03]. However, that combination only addresses the lowest
level of DL proposed in the DAML setting. Another aspect is the concept and demonstration
of an Ontology algebra. Such an algebra permits multiple, independently developed
ontologies to interoperate in focused applications. When source
ontologies change (as they will), the application ontology can be rapidly adapted using
the existing algebraic formulation.

We do not claim that we solved these issues with finality. The tension between formality
and scruffiness has been an issue in Artificial Intelligence since its inception, and will
continue to hinder progress. The complexity of semantics is without bound, and progress
will only uncover new depths that warrant research. We can only claim to have tried to
make  the  semantic  web  community  aware  of  the  issues  and  provided  constructive  and 
well-founded  directions. 

Our  vision  was  published  as  "An  Information  Food  Chain  for  Advanced  Applications  on 
the  WWW"  [DeckerJMSW:00].  The  diagram  copied  below  depicts  the  approach  and  the 
different  project  parts.  We  will  follow  the  process  in  our  exposition. 


[Diagram components: Ontology Construction Tool; Ontology Articulation Toolkit;
Web-Page Annotation Tool; Metadata Repository; Community Portal; Inference Engine]

Figure 1 The Semantic Web Food Chain


Methods,  Assumptions,  and  Procedures 

Annotation 

Pages on the semantic web have to be annotated before relevant ones can be located.
Documents containing semantic annotations enable a more precise semantic search and allow
for interoperation. These benefits, however, come at the cost of an increased authoring
effort. In our work we have, therefore, presented a comprehensive framework that supports
users in dealing with the documents, the ontologies, and the annotations that link
documents to ontologies.

Manual annotation is tedious, and often done poorly. Even within the funded
DAML project fewer pages were annotated than was hoped. In eCommerce, there has to
be a sufficient business motivation to perform annotations; in the scientific world the
motivation is less, although having the right tools will help [NoySDCFM:01]. Given the
problems with syntax, semantics, and pragmatics in annotation, we identified the
following requirements: consistency, proper reference, avoidance of redundancy, relational
metadata maintenance, ease of use, and efficiency [HandschuhSM:01] [CimianoHS:04].

Our work focused on methods to automate the annotation process
[HandschuhS:03], using existing sources such as ontological knowledge [SureS:02], relational
metadata [HandschuhS:02] [HandschuhSb:03], digital libraries [MelnikGP:00], and
other legacy data [VolzHSS:04]. We provide a comprehensive and pioneering
annotation framework that reduces the complexity of Semantic Annotation for the
annotator. The framework employs a comprehensive set of modules including inference
services, a crawler, a document management system, an ontology guidance/fact browser, and
document editors/viewers. Process issues pertaining to the annotation/authoring task are
separated from content descriptions by a meta ontology.

The framework has been implemented as a prototype in the open source project OntoMat, hosted
by the DARPA DAML program [http://projects.semwebcentral.org/projects/ontomat/]. The
annotation framework is populated with specialized methods for:

- Manual Annotation: The transformation of existing document resources into relatable
  knowledge structures which represent the underlying information.

- Authoring of Documents: Authoring lets users create metadata with little added effort,
  while putting together the content of a page.

- Semi-automatic Annotation: Semi-automatic Annotation based on Information
  Extraction.

- Deep Annotation: Considers Web pages which are generated from a database by
  annotation of the underlying database.

The size of the deep web has been estimated to be many times larger than the shallow
web, the directly accessible information as retrieved by tools such as Google. The deep web
covers the information dynamically populated from databases, as typically done by
business services, and as such is important to the future of the semantic web
[HandschuhSa:03] [HandschuhSV:03a] [HandschuhSV:03b] [HandschuhSV:03]. Its
effective size is hard to measure, since the same database content (say, stock prices) can be
provided by multiple services. Measurements of the deep web have also counted the
huge volume of images that satellites have collected. While those are also accessible on
the web, the value in terms of actionable information per megabyte stored is small. But
no matter what the size metric should be, dealing with the deep web will be crucial, and
will require tools that are linked to database technology.
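
As a minimal sketch of the deep-annotation idea, assume a hypothetical relational table and a hand-written mapping from column names to ontology properties (neither is taken from OntoMat or CREAM, and the "ex:" names are invented). The schema is annotated once, and statements are then derived for every row that a dynamically generated page would display:

```python
# Hypothetical column-to-property mapping; the "ex:" names are illustrative only.
COLUMN_TO_PROPERTY = {
    "name": "ex:hotelName",
    "city": "ex:locatedIn",
    "price": "ex:pricePerNight",
}

def rows_to_triples(rows, key_column, mapping, prefix="ex:hotel/"):
    """Turn database rows into (subject, predicate, object) triples.

    The schema is annotated once (via `mapping`); every row served
    from the database then inherits that annotation.
    """
    triples = []
    for row in rows:
        subject = prefix + str(row[key_column])
        for column, prop in mapping.items():
            # Skip columns that are absent or NULL in this row.
            if row.get(column) is not None:
                triples.append((subject, prop, row[column]))
    return triples

rows = [
    {"id": 1, "name": "Seaside Inn", "city": "Galway", "price": 80},
    {"id": 2, "name": "Campus Lodge", "city": "Stanford", "price": None},
]
triples = rows_to_triples(rows, "id", COLUMN_TO_PROPERTY)
```

The point of the sketch is the division of labor: the annotator touches only the mapping, while the volume of annotated statements grows with the database.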

OntoMat is the reference implementation of the CREAM framework
[HandschuhSM:01] [HandschuhSC:02]. It is Java-based and provides a plug-in interface for
extensions for further applications. It has been used in several cases, e.g. the annotation
of paper abstracts for the International Conferences on Semantic Web (ISWC 2002, 2003,
2004) by each of the authors. OntoMat is in use on classroom machines in an obligatory
Semantic Web course for informatics students in Prague, which enrolls some 250 people
every year [http://nb.vse.cz/~svatek/modz.htm].


Ontologies 

Information for annotation can be derived from many sources, as discussed above, but
tools are required to create effective ontologies [MaedcheS:03] [MaedcheNS:03]
[OberleVSM:04] [StaabEAD:01]. Automation using AI learning technologies is one
approach [MaedcheS:01] [MaedcheS:03]. Ontological information may also be obtained by
inferencing [SureAS:03] [SureEASSW:02].

Once  ontologies  are  established  they  have  to  be  maintained  [AbererEa:04].  The 
ontologies can be stored anywhere on a dynamic network [NejdlEa:02], or on a grid
[TangmunarunkitDK:03].

The  core  Protege  system  software  was  modified  to  support  the  development  of  RDF 
enhancements  to  Protege.  In  order  to  allow  ontologies  maintained  within  the  Protege 
system  to  interoperate  with  the  RDF  representation,  a  plugin  is  available  from  the  Protege 
web  site  [NoySDCFM:01]. 

Much  of  this  information  is  summarized  in  a  handbook,  to  which  most  OntoAgents 
project  participants  have  contributed  [StaabS:04].  A  future  research  challenge  is 
developing  support  systems  for  ontology  evolution  and  supporting  adaptation  of  the 
applications that use those ontologies, when the ontologies are updated [MitraWD:01]
[Oliver:00].

Knowledge  Management 

Having  well  structured  and  focused  ontologies  provides  a  basis  for  organizing 
knowledge,  the  main  distinguishing  property  in  modern  organizations  and  businesses 
[StaabSS:02] [StaabSSS:01] [SureS:02] [SureSS:02]. A complementary current
approach to knowledge management is the use of topic boards, and we explored that
relationship [LacherD:01]. Organizing knowledge encapsulated in governmental regulations
is a current issue as well; here we are cooperating with a project in Stanford's Department
of Civil and Environmental Engineering [LauKLW:04] [LauLW:05].

The commonality that can be achieved is still unclear, but is a topic of continuing
research [BernsteinHJRW:00] [Melnik:03] [MelnikRBa:03] [MelnikRBb:03].

Infrastructure 

Web services are expected to operate in a widely distributed environment, and we
interacted with and supported projects that focused on the required infrastructure
[LiuPLW:04] [MelnikD:00] [MelnikGR:02]. The scalability of these systems, while
maintaining correctness, is a major concern, as expressed in a workshop that was
organized by ISI colleagues [VolzDC:03]. We investigated how Semantic Web standards
such as RDF and OWL can be used within our reasoning language TRIPLE [SintekD:03]
for resource matching for the Grid. Results are promising and have spawned follow-up
work in the resource matchmaking area using rules. The second topic centered
on the emerging Grid notion [TangmunarunkitDK:03] [HarthDHTK:04].

Furthermore  we  were  involved  in  discussions  around  a  potential  rules  standard  for  the 
Semantic  Web  [HorrocksADKGW:03].  Extended  operators,  such  as  aggregation  in 
TRIPLE,  proved  to  be  necessary  for  matchmaking  applications.  TRIPLE  is  under 
continued  development  and  is  available  at  http://triple.semanticweb.org. 

Inferencing  Agents 

Application of the knowledge, through agents that perform reasoning and inferencing
procedures,  is  central  to  the  promise  of  the  semantic  web.  As  implied  in  the  introduction 
to  this  section,  it  is  here  where  the  technologies  now  used  by  the  AI  community  need  to 
come  together.  Scalability  and  pragmatic  effectiveness  are  expected  in  the  semantic  web. 

Inferencing, i.e., relating the knowledge sentences from the sources to achieve higher
level goals, is needed during construction of ontologies [SureAS:02] as well as during
their application [NoySDCFM:01]. Work within OntoAgents has focused on TRIPLE
[DeckerS:02] [SintekD:03], which shares the RDF and OWL-DL knowledge
representations with other DAML projects. TRIPLE's Horn-logic-based approaches have
been applied to representations used for description logics [GrosofHVD:03]. That work
identified the common intersection between Logic Programming languages and
Description Logic languages, and dubbed it Description Logic Programming. We showed
that a large part of a language such as OWL or DAML+OIL can be captured within that
Description Logic Programming framework, which allows for efficient reasoners for
these language subsets. That work is now widely cited and used in follow-up work.
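
The intersection can be illustrated with a small sketch. It assumes only two kinds of statements, subclass axioms and class-membership facts, which is a far smaller fragment than the language subset treated in [GrosofHVD:03]. Under that assumption, a description-logic axiom "C is a subclass of D" can be read as the Horn rule D(x) <- C(x) and evaluated by plain forward chaining. The class and individual names are invented, and the code is Python, not TRIPLE syntax.

```python
def forward_chain(subclass_axioms, facts):
    """Apply the rule D(x) <- C(x) for each axiom (C, D) until a fixpoint.

    subclass_axioms: set of (subclass, superclass) pairs
    facts: set of (individual, class) membership pairs
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(derived):
            for sub, sup in subclass_axioms:
                if cls == sub and (individual, sup) not in derived:
                    derived.add((individual, sup))
                    changed = True
    return derived

# Illustrative axioms and facts (hypothetical names).
axioms = {("Professor", "Researcher"), ("Researcher", "Person")}
facts = {("alice", "Professor")}
closure = forward_chain(axioms, facts)
```

Because every rule has a single positive head, the loop terminates once no new membership fact can be added, which is the property that makes such language subsets attractive for efficient reasoners.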

An underlying issue is how demanding the applications of the semantic web will be.
If use is no more complex than seen in the common search models today, available
technologies will provide adequately broad information, but not avoid the dreaded
information overload. Any excess or wrong information must now be filtered out by the
end-users. Annotation will improve that filtering somewhat [AgarwalHS:03]. But
semantic web proponents expect a much greater level of automation. For routine business
applications filtering has to be carried out without user participation. More complex,
multi-service applications require a greater depth of inferencing to obtain adequate
information, but filtering of mismatches is essential to avoid overload. The optimal, or at
least effective, tradeoff between missing some information and receiving excessive junk must
be based on a situational criterion; that balance warrants formal quantification
[Wiederhold:02].
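
One possible way to make such a situational criterion explicit, offered only as an illustrative sketch and not as the formulation of [Wiederhold:02], is to assign a cost to each relevant item that is missed and to each irrelevant item that is delivered, and then to choose the filtering threshold that minimizes the total. The scores, relevance labels, and cost values below are invented.

```python
def total_cost(scored_items, threshold, cost_miss, cost_junk):
    """Cost of showing only items whose score is >= threshold.

    scored_items: list of (score, is_relevant) pairs
    """
    cost = 0.0
    for score, is_relevant in scored_items:
        shown = score >= threshold
        if is_relevant and not shown:
            cost += cost_miss   # relevant information was filtered out
        elif shown and not is_relevant:
            cost += cost_junk   # junk reached the user
    return cost

def best_threshold(scored_items, cost_miss, cost_junk):
    """Pick the candidate threshold with the lowest total cost."""
    candidates = sorted({score for score, _ in scored_items}) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(scored_items, t, cost_miss, cost_junk))

items = [(0.9, True), (0.7, True), (0.6, False), (0.4, True), (0.2, False)]
# A casual search tolerates misses; a critical task does not.
casual = best_threshold(items, cost_miss=1.0, cost_junk=2.0)
critical = best_threshold(items, cost_miss=10.0, cost_junk=1.0)
```

With these invented numbers the casual setting selects a stricter threshold than the critical one, which is the situational dependence the paragraph describes.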

Added  value  for  applications  is  generated  when  knowledge  can  be  applied  to 
projects outside of the computer science community. A major test of today's capabilities
was the Halo Project [FriedlandEa:04] [FriedlandEa:05]. Participants from Karlsruhe,
using simple deductive inferencing, were able to compete effectively using fewer
resources and less time. The approach used by their Ontoprise system required far less
tuning and narrowing of the knowledge bases than approaches used by others
[AngeleMOSW:03].

Testing Resources

For testing purposes, we have made a large and densely linked XML file of
Movies, their directors, actors, casts, and remakes (for deeper inferencing) available
[WiederholdA:2004]. This material could be converted to RDF, and provide a more
complex setting than the bibliographic files now in common use. A tool, XLint, was
developed to report syntactic errors in XML files to allow bulk repairs of systematic
errors to proceed rapidly [ArguelloCW:04]. Systematic errors will occur when converting
large HTML data collections to XML, because of the strict well-formedness constraints
imposed by XML.
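
The strictness of XML parsing can be seen with any conforming parser. The following sketch uses Python's standard xml.etree.ElementTree module and is not the XLint tool itself; it reports the position of the first well-formedness error in a fragment of the kind one might obtain from converted HTML. The sample markup is invented.

```python
import xml.etree.ElementTree as ET

def first_syntax_error(xml_text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # (line, column) reported by the parser
    return None

# HTML habits such as an unclosed <br> are errors in XML.
converted = "<movie>\n<title>Metropolis</title>\n<br>\n</movie>"
well_formed = "<movie>\n<title>Metropolis</title>\n<br/>\n</movie>"

error_at = first_syntax_error(converted)
ok = first_syntax_error(well_formed)
```

Note that the parser stops at the first error, so a bulk-repair workflow has to fix and re-run repeatedly, or use a tool that recovers and continues.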

The OntoAgents project also supported an RDF encoding of WordNet 1.6
[MelnikD:01]. This RDF resource is an input to the W3C Best Practices Working
Group.

Resolving  Heterogeneity 

An  issue  of  concern  is  that  as  the  web  grows,  many  ontologies  will  evolve,  exacerbating 
issues  of  scalability  [BozsakEa:02]  [VolzDC:03].  When  applications  require  information 
from  multiple  autonomous  sources,  we  cannot  expect  a  common  ontology,  since  a  joint 
or  global  ontology  would  hinder  growth  and  effectiveness  in  narrow  domains 
[Wiederhold:03].  The  differences  may  be  minor,  but  their  import  is  hard  to  assess  by 
users,  unless  tools  are  made  available  [MaedcheS:02]. 

Resolving semantic heterogeneity among knowledge and data resources has been
an issue of research at Stanford for some time [Wiederhold:94] [Wiederhold:00]
[MitraWK:00] [Melnik:00]. Our concepts center on an Ontology algebra. Our focus
within the OntoAgents project has been on the articulation of pairs of ontologies using
semi-automated methods [MitraWK:00] [MitraWD:01] [Wiederhold:01] [MitraW:02]
[Mitra:04] [MitraW:04].

Each  articulation  can  focus  on  a  specific  application,  and  becomes  easier  to 
maintain  and  manage.  An  initial  phase  suggests  articulation  rules,  containing  candidate 
matches  for  interoperation.  When  validated  by  the  interoperation  expert  they  enter  an 
application-focused  repository.  During  the  operational  phase  interoperation  among 
resources  described  by  those  articulation  rules  can  proceed  automatically. 
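
The two phases can be sketched as follows. This is a simplified illustration, not the method of [MitraWD:01]: candidate matches are suggested here by plain string similarity from Python's difflib module, the term lists are invented, and the "expert" is a stand-in function.

```python
from difflib import SequenceMatcher

def suggest_rules(terms_a, terms_b, min_similarity=0.8):
    """Suggest candidate articulation rules as (term_a, term_b, score)."""
    candidates = []
    for a in terms_a:
        for b in terms_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= min_similarity:
                candidates.append((a, b, round(score, 2)))
    return candidates

def validate(candidates, expert_accepts):
    """Keep only the rules that the interoperation expert accepts."""
    return [(a, b) for a, b, _ in candidates if expert_accepts(a, b)]

# Hypothetical terms from two independently developed ontologies.
carrier_terms = ["Vehicle", "Cargo", "Truck"]
factory_terms = ["Vehicles", "CargoCarrier", "Buyer"]

candidates = suggest_rules(carrier_terms, factory_terms)
# Stand-in for the expert: accept all suggestions in this toy example.
repository = validate(candidates, lambda a, b: True)
```

Only the validated pairs enter the application-focused repository; when a source ontology changes, the suggestion step can be re-run and only the changed candidates need to be reviewed again.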

Some  related  work  at  Stanford  is  quite  formal,  but  has  provided  important 
background  [McCarthy:93]  [Buvac:04]. 

Portals 

Access  to  knowledge  and  information  is  provided  through  portals,  the  desktop  interfaces 
used by the public to interact with the web [DeckerF:04]. Consistency and
maintainability demand that those portals be driven by ontologies [MaedcheSSS:01]
[JinDW:01] [JinX:02] [StaabSSV:02] [MaedcheSSSS:03] [MaedcheSSSV:02]
[HartmannS:04]. While promising, the Stanford effort on ontology-based assistance in the
construction of portals was only brought to a prototype stage.

Web services


Obtaining actionable information from web services is the end objective envisaged in
OntoAgents, as well as in the entire DAML effort [MaedcheNS:03].

The  business  model  of  web  services  is  just  now  being  established.  It  is  unclear  how 
these  services  will  be  supported  in  the  long  term,  by  the  sale  of  associated  products,  by 
advertising,  by  volunteer  efforts,  or  by  public  funds,  but  it  will  likely  be  a  combination  of 
all  of  these  [AgarwalHS:03].  When  the  product  of  the  web  service  is  information,  as  now 
kept  in  databases,  subscription  models  are  common,  but  reduce  flexibility.  Interaction 
with  the  database  [AngeleMOSW:03]  [DeckerK:03]  and  digital  library  [MelnikGP:00] 
[LarsenEa:03]  [Wiederhold:03a]  [Wiederhold:03b]  communities  is  important  for 
management  of  content. 

The  lack  of  experience  with  semantic  web  operations  makes  it  difficult  to  formalize  a 
business  model,  even  though  business-oriented  metrics  will  be  essential  to  gain  support 
[Wiederhold:05]. 

Results  and  Discussion 

We cite here the web sites that contain results from the OntoAgents Project. The
References cite a large number of publications where the OntoAgents project provided
some  input  or  relevance.  The  modest  investment  in  the  ongoing  work  at  the  University  of 
Karlsruhe  was  especially  productive.  Not  all  of  the  papers  listed  in  the  references  are 
cited  in  the  descriptive  text  above.  A  number  of  workshops  were  organized  as  well. 

Websites 

Information  about  OntoAgents,  its  products,  and  related  research  is  available  at 

http://www-db.stanford.edu/OntoAgents/ = OntoAgents abstracts only [Decker]

http://www.semanticweb.org/  =  General  web  site,  not  updated  since  June  2003  [Decker  et 
al.] 

http://annotation.semanticweb.org/ = Web site dedicated to semantic annotation
[Handschuh] 

http://projects.semwebcentral.org/projects/ontomat/ = Project page and CVS repository of
OntoMat OWL/RDF semantic annotation tool, available under the GNU Lesser
General  Public  License  (LGPL)  [Handschuh] 

http://projects.semwebcentral.org/projects/owlcrawler/ = Project page and CVS repository
of  OWL/RDF  or  FOAF  crawler  [Handschuh] 

http://www.aifb.uni-karlsruhe.de/about.html  =  The  SSEAL  portal  at  the  AIFB  Karlsruhe. 

http://www-db.stanford.edu/SKC/index.html  =  Predecessor  project  on  Semantic 
Interoperation  [Mitra,  Wiederhold] 

http://www.aifb.uni-karlsruhe.de/WBS/sha  =  Ontology  development  [Handschuh] 

http://protege.stanford.edu/plugins/rdf/ = Protege RDF backend plugin [Fergerson]


http://www-db.stanford.edu/OntoAgents/xlint/index.html  =  Xlint  processor  [Arguello] 

http://www.dfki.uni-kl.de/lfodo/triple  and  http://triple.semanticweb.org  =  TRIPLE 
inference  engine  [Decker  and  Sintek] 

http://www.ontoweb.org/download/deliverables/D21_Final-final.pdf = Scenarios [Leger
et al.]

http://edutella.jxta.org/  =  RDF-based  Metadata  Infrastructure  for  P2P  Applications 
(PADLR/Edutella)

Conclusions 

This Section represents my personal observations on three topics, and reflects in no way
the work and opinions of other DAML or OntoAgents project participants. I have
received  some  valuable  feedback  from  OntoAgents  researchers.  Since  my  participation 
diminished  greatly  after  my  retirement  I  will  not  be  aware  of  all  advances  made  since 
then.  So,  if  issues  I  list  below  have  been  overcome,  congratulations! 

The DAML project was initiated at the birth of the semantic web. It contributed greatly to
defining a new research area but, because of its novelty, also had to depend on researchers
who had been active earlier in other computer science settings. As a result, some tradeoffs
needed to bring the semantic web, as envisaged here, into practical real-world use have not
been established as well as they need to be.

Robustness. 

Acceptance  of  RDF  or  similar  representations  is  today  a  major  barrier  for  users  outside  of 
academia,  who  today  are  still  fighting  XML  and  its  requirements.  In  reviewing  web 
technology  we  observe  a  trend. 

The  acceptance  of  HTML  was  enabled  by  the  robustness  of  the  browsers.  Even 
today many HTML pages on the web have syntactic and content errors, but they remain
human-understandable, and can also be adequately processed by search engines and
screen scrapers. However, a single syntactic error in an XML document typically prevents
access  to  all  subsequent  information.  Such  a  punitive  interpretation  is  discouraging. 

RDF  seems  to  be  no  better.  It  is  unclear  to  what  extent  the  problem  can  be  addressed  by 
improving  the  representation  versus  adapting  the  interpreters.  Some  settings  of  the 
semantic  web  indeed  require  completeness  and  the  attendant  cost  to  attain  perfection;  but 
many do not. When searching for a hotel I am happy with a dozen choices; any more creates
overload. It is unlikely that the 13th hotel choice, not shown properly because of a syntax
error, would significantly change my decision. If that hotel entry had been early in an
XML list, however, I would have failed to see all of the remainder. Can the expected
perfection  become  a  parameter? 
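
One way to read this question, offered as a hypothetical sketch rather than an existing standard or tool, is to let the consumer state how many defective records it will tolerate. The example assumes that the records of a list arrive as separate XML fragments, so that an error in one entry does not hide the entries that follow; the hotel data are invented.

```python
import xml.etree.ElementTree as ET

def parse_tolerantly(record_fragments, max_bad_fraction):
    """Parse each record on its own; fail only if too many are defective.

    max_bad_fraction = 0.0 demands perfection; larger values trade
    completeness for robustness.
    """
    good, bad = [], 0
    for fragment in record_fragments:
        try:
            good.append(ET.fromstring(fragment))
        except ET.ParseError:
            bad += 1  # skip the defective record, keep the remainder
    if record_fragments and bad / len(record_fragments) > max_bad_fraction:
        raise ValueError(f"{bad} of {len(record_fragments)} records are malformed")
    return good

hotels = [
    "<hotel><name>Alpha</name></hotel>",
    "<hotel><name>Beta</hotel>",          # syntax error: <name> not closed
    "<hotel><name>Gamma</name></hotel>",
]
usable = parse_tolerantly(hotels, max_bad_fraction=0.5)
names = [h.findtext("name") for h in usable]
```

A hotel search could run with a generous tolerance, while a setting that requires completeness would pass 0.0 and accept the cost of rejecting the whole list.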

Automatic  annotation 

Annotation is crucial to the concept of the semantic web, but also
time-consuming. There has been much research here, but I have not yet seen any public
business webpages that were annotated using such tools. Without applications that allow
the providers to profit from the annotations, there is little benefit and actually some risk
of misuse of annotations. Webpages used to improve internal knowledge management
can be, and are, profitably annotated in some organizations.

For  legacy  web  pages  automatic  assistance  for  annotation  is  essential  and  must  be 
convenient  rather  than  perfect.  The  first  round  provided  by  automation  should  be  easy, 
maybe  even  invisible  to  the  users.  Its  output  should  allow  convenient  refinement,  by 
humans  as  well  as  tools.  That  will  likely  require  tracking  of  the  provenance  of 
annotations,  so  we  don't  repeat  the  validation  problems  now  encountered  in  the  genome 
project. 

New  technologies  are  emerging  that  provide  annotations  as  the  data  are  entered. 
Interoperation  of  those  annotations  will  require  that  those  technologies  use  the  same 
ontologies;  or  that  the  ontologies  themselves  become  interoperable.  There  are  justifiable 
barriers to sharing ontologies at the level of the creators of the data that will not be
overcome by presenting a vision of a grand future [Wiederhold:02] [Wiederhold:03a]. If
there  are  inadequate  benefits  compared  to  the  costs  for  the  information  generator,  then  the 
imposition  of  external  expertise,  supported  by  the  users  that  benefit,  has  to  be  enabled. 

One problem is that an optimal ontology for one application category, such as
geo-coding for photographic images (FOAF), is not likely to be effective for geo-coding of
Marine Corps logistic destinations and interchange points [Berg:03].

Any  annotation  must  be  viewable,  else  no  feedback  will  be  generated  by  owners 
and  users.  If  annotations  remain  disjoint,  (obsolete)  computer-science  principles  may  be 
served,  but  failures  due  to  annotation  errors  will  remain  mysteries.  The  lack  of  integration 
of  annotation  and  viewable  content  is  a  major  discouragement  in  current 
implementations. 

Recommendation 

For  dissemination  of  DAML  and  successor  results,  the  potential  customers  of  those 
results  need  to  see  the  effectiveness  of  research  products  in  an  easy-to-perceive  and 
relatively  unbiased  manner.  Having  some  publicly  available,  realistic  and  compelling 
scenarios  will  also  focus  semantic  web  research,  since  they  can  be  used  by  the 
community to test their work. This suggestion is not original, and was widely discussed
in  2002,  when  it  was  obvious  that  using  the  DAML  machinery  merely  to  conclude  that 
"Mary  is  the  parent  of  Bill"  was  not  compelling  [Pease:02]  [Brachman:02]. 

There  was  a  nice  scenario  in  the  Berners-Lee,  Hendler  and  Lassila  Scientific 
American  Article,  but  I  have  not  seen  it  actually  demonstrated.  That  scenario  is  quite 
ambitious,  and  depends  too  much  on  resources  that  do  not  exist  today.  Other  example 
scenarios  have  been  listed  on  DAML  participant  reports,  but  not  worked  out,  as  far  as  I 
know,  to  provide  a  sharable  set  of  test  cases.  The  European  OntoWeb  Project  lists  21 
'Successful  Scenarios'  of  Semantic  Web  technology,  but  none  is  documented  yet  to  the 
level  that  it  can  be  used  as  a  test  case  for  measuring  semantic  web  technology  progress 
and  innovation. 

The relevant site data also have to be available. The Halo project provided that basis,
in  the  area  of  answering  questions  on  High-school  level  Chemistry.  Its  creation 
comprised  much  of  the  cost  of  the  Halo  project.  The  DARPA  community  did  use 
scenarios  in  the  prior  HPKB  project  and  provided  data  for  participants  in  its  TREC 
efforts.  The  Database  community  now  has  its  standard  transaction  streams  used  to  assess 
progress. 

Having standard scenarios, of varying types, with substantial data, will allow the
community to assess open issues, such as the tradeoff between formality and scruffiness
needed in semantic web engines, and the failure rates and performance issues faced by
alternate logics.